Abstract


Many applications of intelligent systems require 
reasoning about the mental states of agents in 
the domain. We may want to reason about 
an agent’s beliefs, including beliefs about other 
agents; we may also want to reason about an 
agent’s preferences, and how his beliefs and pref- 
erences relate to his behavior. We define a prob- 
abilistic epistemic logic (PEL) in which belief 
statements are given a formal semantics, and 
provide an algorithm for asserting and query- 
ing PEL formulas in Bayesian networks. We 
then show how to reason about an agent’s be- 
havior by modeling his decision process as an in- 
fluence diagram and assuming that he behaves 
rationally. PEL can then be used for reasoning 
from an agent’s observed actions to conclusions 
about other aspects of the domain, including un- 
observed domain variables and the agent’s men- 
tal states. 


1 Introduction 


When an intelligent system interacts with other 
agents, it frequently needs to reason about these 
agents’ beliefs and decision-making processes. Exam- 
ples of systems that must perform this kind of reason- 
ing (at least implicitly) include automated e-commerce 
agents, natural language dialogue systems, intelligent 
user interfaces, and expert systems for such domains 
as international relations. A central problem in many 
domains is predicting what other agents will do in the 
future. Since an agent’s decisions are based on its be- 
liefs and preferences, reasoning about mental states 
is essential to making such predictions. An equally 
important task is making inferences about the state of 
the world based on another agent’s beliefs (possibly re- 
vealed through communication) and decisions. Since 
other agents often observe variables that are hidden 
from our intelligent system, their beliefs and decisions 
may provide information about the world that the sys- 
tem cannot obtain by other means. 

Suppose, for example, that we are developing a system
to help analysts and policymakers reason about
international crises. In one example, based loosely on 
a scenario presented in [3], Iraq purchases weapons- 
grade anthrax (a deadly bacterium) and begins to de- 
velop a missile capable of delivering anthrax to targets 
in the Middle East. There is a vaccine against anthrax 
which the United States is currently administering to 
its troops, but for ethical reasons the U.S. has not done 
controlled studies of the vaccine’s effectiveness. Iraq, 
on the other hand, may have performed such tests. 
Iraq’s purpose in attempting to develop an anthrax- 
equipped missile is to strike U.S. Air Force personnel 
in Turkey or Saudi Arabia, inflicting as many casual- 
ties as possible. However, if Iraq works on developing 
the missile, it must use an old weapons plant that is 
prone to fire; a fire at the plant would be visible to 
U.S. satellites. We would like our intelligent system to 
be able to answer questions like, “If we observe that 
Iraq has purchased anthrax, what is the probability 
that the vaccine is effective?”, and “Does Iraq believe 
(e.g., with probability at least 0.3) that if they begin 
developing an anthrax-carrying missile, the U.S. will 
realize (e.g., believe with probability at least 0.9) that 
they have acquired anthrax?”. 


Efforts to formalize reasoning about beliefs date back 
to Hintikka’s work on epistemic logic [6]. The classical 
form of epistemic logic does not allow us to quantify 
an agent’s uncertainty about a formula; we can only 
say that an agent knows $\varphi$ or does not know $\varphi$. Probabilistic
logics of knowledge and belief [4, 16] remove
this limitation. However, evaluating the probability
that an agent a assigns to a formula $\varphi$ in a model of
one of these logics requires evaluating $\varphi$ at every state
that a considers possible. As the number of states is
exponential in the number of domain variables, this 
process is computationally intractable. 


One of the main contributions of this paper is the 
introduction of a probabilistic epistemic logic (PEL) 
that uses Bayesian networks (BNs) [12] as a compact 
representation for the agents’ beliefs. This framework
allows us to perform probabilistic epistemic inference 
without enumerating an exponential number of states. 
Our approach is based on the common prior assumption,
which is common in economics [1]. It states that
the agents have a common prior probability distribu- 
tion over outcomes and their beliefs differ only because 
they have different observations; this assumption al- 
lows us to use a single BN for representing all the 
agents’ beliefs. We describe an implemented algorithm 
for adding nodes to this BN so that it can be used to 
evaluate arbitrary PEL formulas. 


In most domains, agents do not just passively observe 
the world and form beliefs; they also make decisions 
and act. In many existing probabilistic reasoning systems
(e.g., [2, 13, 7]), a human expert defines the conditional
probability distributions (CPDs) that describe
how likely an agent is to take each possible action, 
given an instantiation of the variables relevant to the 
agent’s decision process. But this technique relies on 
a human’s understanding of how agents make deci- 
sions, and it may be difficult for a human to perform 
such analysis for a complex model. If we assume an 
agent acts rationally, the intelligent system can use de- 
cision theory to derive the CPDs for the agent’s actions 
automatically. This problem involves subtle strategic 
(game-theoretic) reasoning when multiple agents are 
acting and have uncertainty about each other’s ac- 
tions [5]. In this paper we restrict attention to the 
case where only one agent acts. We model the agent’s 
decision process using an influence diagram (ID) [8], 
then convert this influence diagram into a Bayesian 
network. This extension allows us to use PEL in or- 
der to reason about the decision-maker’s likely course 
of action, and (more interestingly) to use his actions 
to reach conclusions about unobserved aspects of the 
world. We can also extend the framework to reach con- 
clusions about the decision-maker’s preferences, which 
may not be common knowledge. 


2 A Probabilistic Epistemic Logic 


Our probabilistic epistemic logic (PEL) is essentially a 
special case of the logic of knowledge and belief defined 
by Fagin and Halpern [4] (FH hereafter). In PEL, we 
assume that agents have a common prior probability 
distribution over states of the world, and an agent’s 
local probability distribution at state s is equal to this 
global distribution conditioned on the set of states the 
agent considers possible at s. These assumptions are 
not uncontroversial, but we will defer a discussion of 
the alternatives until Section 6. 


The language of PEL is parameterized by a set $\Phi$ of
random variable symbols, each with an associated domain;
a set $A$ of agents; and a number $N_a$ of observation
stages for each agent $a \in A$. At each of an agent’s
observation stages, there is a certain set of variables 
whose values the agent has observed. In this paper, 
we will make the perfect recall assumption: agents 
do not forget observations they have made. The values 
of the variables themselves do not change from stage to 
stage (if we want to model an aspect of the world that 
changes over time, we must create separate variables 
for separate times). 


Given these parameters, the language of PEL consists 
of the following: 


• atomic formulas of the form X = v, where $X \in \Phi$
and $v \in \mathrm{dom}(X)$ (the domain of X). Note that
dom(X) need not be {true, false}; it may be any
non-empty finite set.

• formulas of the form $\neg\varphi$ and $\varphi \lor \psi$, where $\varphi$ and $\psi$
are PEL formulas; we use $\varphi \land \psi$ as an abbreviation
for $\neg(\neg\varphi \lor \neg\psi)$.

• formulas of the form $\mathrm{BelCond}_{a,i}^{\geq r}(\varphi \mid \psi)$, where
$a \in A$, $i \in \{1, \ldots, N_a\}$, $\varphi$ and $\psi$ are PEL formulas,
and r is a probability in [0, 1].


Our atomic formulas play the same role as propositions 
in the FH logic. The modal formula $\mathrm{BelCond}_{a,i}^{\geq r}(\varphi \mid \psi)$
should be read as, “according to agent a in stage i,
the conditional probability of $\varphi$ given $\psi$ is at least r”.
The unconditional belief operator $\mathrm{Bel}_{a,i}^{\geq r}(\varphi)$ is an abbreviation
for $\mathrm{BelCond}_{a,i}^{\geq r}(\varphi \mid \mathit{true})$. We will provide
formal semantics for these statements after defining a 
model theory for PEL. Note that the ability to express 
conditional belief statements is not included in the FH 
logic, although their belief statements are more expres- 
sive than ours in allowing probabilities to be related 
by arbitrary linear inequalities. 


Definition 1 A model M of the PEL language having
random variables $\Phi$, agents $A$ and observation process
lengths $\{N_a\}_{a \in A}$ is a tuple $(S, \pi, K, P)$, where:

• S is a set of possible states of the world;

• $\pi$ is a value function mapping each random variable
symbol $X \in \Phi$ to a discrete random variable
$X^\pi$ (a function from S to dom(X));

• K maps each pair in $\{(a, i) \in A \times \mathbb{Z}^+ : i \leq N_a\}$
to an accessibility relation $K_{a,i}$, which is an equivalence
relation on S;

• P is a probability distribution over S.


Thus, a PEL model specifies a set of states and maps 
each random variable symbol to a random variable de- 
fined on those states. In the rest of the paper, we will 
often refer to a random variable $X^\pi$ simply as X; it
should be clear from context whether a random variable
or a random variable symbol is intended. The
accessibility relation $K_{a,i}$ holds between worlds that
are indistinguishable to agent a at stage i. In other
words, at stage i, if s and s' are in the same accessibility
equivalence class, agent a has no information that
allows him to distinguish between world s and world
s'. We will use the notation $K_{a,i}(s)$ to refer to the
set of states $s' \in S$ such that $K_{a,i}(s, s')$. With this
semantics, the perfect recall assumption is formalized
as a requirement that if $\neg K_{a,i}(s, s')$, then for all $j > i$,
we also have that $\neg K_{a,j}(s, s')$.


P is the agents’ common prior probability distribution 
over the set of states S. For each agent a, stage i, and 
state s, we can derive a local distribution $p_{a,i,s}$ over
the states accessible from s. This local distribution 
is the subjective probability that the agent assigns to 
each accessible state. 


Definition 2 Consider any $a \in A$, $i \in \{1, \ldots, N_a\}$,
and $s \in S$. Then for each state $s' \in K_{a,i}(s)$, we define:
$p_{a,i,s}(s') = P(s' \mid K_{a,i}(s))$.


Note that an agent a’s subjective probability distri- 
bution varies from state to state. Thus, other agents’ 
uncertainty about the state of the world can lead to 
uncertainty about a’s beliefs. For example, in some 
states Iraq believes the anthrax vaccine to be effec- 
tive, and in other states it does not; the U.S. may not 
be able to distinguish these two kinds of states. 
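
As an illustration of Definition 2, the following Python sketch conditions a common prior on the set of states an agent cannot distinguish from s. The state encoding (pairs of values for V and F), the prior probabilities, and the function names are assumptions made for this example only; they are not taken from the model used in the paper.

```python
from fractions import Fraction


def local_distribution(prior, same_class, s):
    """Condition the common prior on the states the agent cannot tell apart from s.

    prior:      dict mapping each state to its prior probability P(state)
    same_class: function (s, t) -> bool playing the role of K_{a,i}
    """
    accessible = [t for t in prior if same_class(s, t)]
    mass = sum(prior[t] for t in accessible)
    if mass == 0:
        raise ValueError("accessible set has prior probability zero")
    return {t: prior[t] / mass for t in accessible}


# Hypothetical toy model: a state is a pair (V, F); the agent observes only F.
prior = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(7, 10),
    (False, True): Fraction(3, 20),
    (False, False): Fraction(1, 20),
}


def observes_f(s, t):
    """Two states are indistinguishable when they agree on F."""
    return s[1] == t[1]


p_local = local_distribution(prior, observes_f, (True, True))
```

With these made-up numbers, an agent that sees F = true assigns probability 2/5 to V = true and 3/5 to V = false.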


The semantics of PEL will be familiar to readers with 
background in modal logic. We introduce a satisfaction
relation $\models$, such that $(M, s) \models \varphi$ means the formula
$\varphi$ is satisfied at world s in model M. We also
define an inverse relation $[\varphi]_M = \{s \in S : (M, s) \models \varphi\}$.


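Definition 3 $(M, s) \models \varphi$ if one of the following holds:
• $\varphi$ is an atomic formula X = v and X(s) = v.
• $\varphi = \neg\psi$ and $(M, s) \not\models \psi$.
• $\varphi = \psi \lor \chi$, and $(M, s) \models \psi$ or $(M, s) \models \chi$.
• $\varphi = \mathrm{BelCond}_{a,i}^{\geq r}(\psi \mid \chi)$, $p_{a,i,s}([\chi]_M) \neq 0$, and
$p_{a,i,s}([\psi]_M \mid [\chi]_M) \geq r$.


Note that if there are no states accessible from s that
satisfy $\chi$, then $\mathrm{BelCond}_{a,i}^{\geq r}(\psi \mid \chi)$ is defined to be false at s.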
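
The satisfaction relation can be sketched as a brute-force recursive check over an explicitly enumerated state space. This is only an illustration of Definition 3, not the efficient procedure developed later in the paper. The tuple encoding of formulas, the `("true",)` constant used for unconditional belief, and the toy prior are all assumptions made for the example.

```python
from fractions import Fraction


def satisfies(model, s, f):
    """Return True iff formula f holds at state s (Definition 3, by brute force).

    model = (states, K, P):
      states: list of dicts mapping variable name -> value
      K:      dict (agent, stage) -> function(s, t) -> bool (equivalence relation)
      P:      list of prior probabilities, aligned with states
    """
    states, K, P = model
    kind = f[0]
    if kind == "true":
        return True
    if kind == "atom":                      # X = v
        _, var, val = f
        return s[var] == val
    if kind == "not":
        return not satisfies(model, s, f[1])
    if kind == "or":
        return satisfies(model, s, f[1]) or satisfies(model, s, f[2])
    if kind == "belcond":                   # BelCond_{a,i}^{>= r}(psi | chi)
        _, agent, stage, r, psi, chi = f
        acc = [(t, p) for t, p in zip(states, P) if K[(agent, stage)](s, t)]
        chi_mass = sum(p for t, p in acc if satisfies(model, t, chi))
        if chi_mass == 0:                   # formula is false by definition
            return False
        both = sum(p for t, p in acc
                   if satisfies(model, t, chi) and satisfies(model, t, psi))
        return both / chi_mass >= r
    raise ValueError(f"unknown formula kind: {kind}")


# Hypothetical two-variable model in which agent "u" observes F at stage 2.
states = [{"V": v, "F": f} for v in (True, False) for f in (True, False)]
P = [Fraction(1, 10), Fraction(7, 10), Fraction(3, 20), Fraction(1, 20)]
K = {("u", 2): lambda s, t: s["F"] == t["F"]}
model = (states, K, P)

bel = ("belcond", "u", 2, Fraction(1, 2), ("atom", "V", True), ("true",))
prob = sum(p for s, p in zip(states, P) if satisfies(model, s, bel))
```

Here `prob` is the total prior probability of the states that satisfy the belief formula. Every BelCond subformula triggers a pass over all accessible states, so the cost of this direct check grows with |S|.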


This definition of satisfaction allows us to evaluate a 
PEL formula at any state s in a given model M. We 
can then use the prior probability distribution P to 
find the total probability of states that satisfy a formula
$\varphi$. If we do this evaluation directly in the PEL
model, we need to evaluate $\varphi$ at each of |S| states, and
the size of the state space can be quite large, typically
exponential in the number of variables. In the
next section, we present a representation for PEL mod- 
els based on Bayesian networks, and an algorithm that 
uses the independence assumptions encoded in the BN
to find the probabilities of PEL formulas efficiently.


Thus, we are proposing an efficient model-checking 
procedure for PEL formulas. We could also provide a 
proof system for PEL; in fact, Fagin and Halpern pro- 
vide a complete axiomatization for their logic. How- 
ever, it is reasonable to assume that an intelligent 
agent will have a complete model representing its own 
belief state, and it is often more efficient to assert and 
query formulas in a model than to attempt to derive 
formulas from a knowledge base (which would need to 
be quite large to completely define the agent’s beliefs). 


3 Representing a PEL model as a BN 


Bayesian networks provide a compact representation 
of a complex probability space. We can augment 
Bayesian networks to provide a compact representa- 
tion of a certain class of PEL models. The basic idea 
is as follows. We define a PEL model M over the set
of random variables $\Phi$ using a BN B that has a node
for each $X \in \pi(\Phi)$. We define S to be the set of all
possible assignments x to the variables in $\pi(\Phi)$. The
distribution defined by B specifies the distribution P
over S.


To define the accessibility relation $K_{a,i}$ in this framework,
we place the restriction that an agent’s observations
always correspond to some set of random variables:


Observation Set Assumption: For every $a \in A$
and $i \in \{1, \ldots, N_a\}$, there is an observation set $O_{a,i} \subseteq \pi(\Phi)$
such that:


$K_{a,i}(s, s') \iff \forall X \in O_{a,i} \; (X(s) = X(s'))$


Given this assumption, the perfect recall assumption
is equivalent to the requirement that if $j > i$, then
$O_{a,i} \subseteq O_{a,j}$.
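
Under the Observation Set Assumption, both the accessibility relation and the perfect recall check reduce to simple set operations. The sketch below illustrates this; the function names and the example observation sets are illustrative assumptions.

```python
def accessible(obs_set, s, t):
    """K_{a,i}(s, t): states s and t agree on every observed variable."""
    return all(s[x] == t[x] for x in obs_set)


def has_perfect_recall(stages):
    """Check O_{a,i} <= O_{a,j} for all j > i.

    stages: list of observation sets O_{a,1}, ..., O_{a,N_a} in stage order.
    Checking consecutive stages suffices because the subset relation is transitive.
    """
    return all(stages[i] <= stages[i + 1] for i in range(len(stages) - 1))


# Illustrative observation sets for one agent over three stages.
stages = [set(), {"F"}, {"F", "A", "C"}]
ok = has_perfect_recall(stages)
same = accessible({"F"}, {"F": True, "V": False}, {"F": True, "V": True})
```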


Definition 4 Let $M = (S, \pi, K, P)$ be a PEL model;
let B be a BN defining a joint distribution Pr and let
$O_{a,i}$ be observation sets consisting of random variables
appearing in B. We say that M and $(B, \{O_{a,i}\})$ are
equivalent if:

• for every $X \in \Phi$, X is in B;

• for any instantiation x of $\pi(\Phi)$, P(x) = Pr(x);

• for each agent a and stage i, $K_{a,i}$ is related to $O_{a,i}$
according to the Observation Set Assumption.


We can now use this framework to model the sce- 
nario described in the introduction. The equiva- 
lent Bayesian network is shown in Figure 1. Let i 
stand for Iraq and u stand for the United States. 
We assume that Iraq has a six-stage observation process:
$O_{i,1} = \{V\}$; $O_{i,2} = \{V, P\}$; $O_{i,3} = \{V, P, B\}$;
$O_{i,4} = \{V, P, B, F, M\}$; $O_{i,5} = \{V, P, B, F, M, A\}$; and
$O_{i,6} = \{V, P, B, F, M, A, C\}$. Meanwhile, the U.S. has
$O_{u,1} = \emptyset$; $O_{u,2} = \{F\}$; and $O_{u,3} = \{F, A, C\}$.


Figure 1: Basic Bayesian network for the crisis management problem.
[Figure not reproduced. Node labels: Anthrax Vaccine Effective (V);
Iraq Purchases Anthrax (P); Iraq Begins Missile Development (B);
Fire at Weapons Plant (F); Iraq Develops Anthrax-Equipped Missile (M);
Iraq Attacks U.S. Air Base (A).]


Before it decides whether to attack the U.S. air base, is 
Iraq quite sure that U.S. casualties will be either high 
or medium? We can answer this question by evaluating
the formula $\mathrm{Bel}_{i,4}^{\geq 0.8}((C = \mathit{high}) \lor (C = \mathit{medium}))$.
A more complex query is “Does Iraq believe with
probability at least 0.3 that if they begin developing
an anthrax-carrying missile, the U.S. will
believe with probability at least 0.9 that they
have acquired anthrax?”. If we fill in the
stages of the observation processes that are implicit
in this question, we get the PEL formula


$\mathrm{BelCond}_{i,2}^{\geq 0.3}(\mathrm{Bel}_{u,2}^{\geq 0.9}(P = \mathit{true}) \mid B = \mathit{true})$.


The Observation Set Assumption implies that it is 
common knowledge what variables agent a has ob- 
served at stage i. In many cases, this assumption 
is unrealistic; in our example, the U.S. might be un- 
certain whether Iraq observed the effectiveness of the 
anthrax vaccine at stage 1. As we show, we can handle
such situations without modifying PEL. We simply
add a new node $\mathit{Observes}_{i,1}(V)$ to the BN of Figure 1.
This node is true if Iraq has observed V at stage 1,
and false otherwise; it can have as parents any nodes
that are not descendants of V. We also add a node
$\mathit{ObservedValue}_{i,1}(V)$ that has V and $\mathit{Observes}_{i,1}(V)$ as
parents. Its domain is $\mathrm{dom}(V) \cup \{\mathit{unknown}\}$. It takes
the value unknown if $\mathit{Observes}_{i,1}(V)$ is false, but has
the same value as V if $\mathit{Observes}_{i,1}(V)$ is true. We let
$O_{i,1}$ contain $\mathit{Observes}_{i,1}(V)$ and $\mathit{ObservedValue}_{i,1}(V)$,
but not V itself.
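
The CPD of the ObservedValue node described above is deterministic given its two parents. A minimal Python sketch of that table follows; the dictionary layout and the function name are assumptions chosen for illustration.

```python
UNKNOWN = "unknown"


def observed_value_cpd(domain):
    """Deterministic CPD for ObservedValue(V) given its parents (V, Observes(V)).

    Returns a dict mapping (v, observes) -> {outcome: probability}, where the
    outcomes range over dom(V) plus the extra value "unknown".
    """
    outcomes = list(domain) + [UNKNOWN]
    cpd = {}
    for v in domain:
        for observes in (True, False):
            shown = v if observes else UNKNOWN   # copy V only when it is observed
            cpd[(v, observes)] = {o: (1.0 if o == shown else 0.0) for o in outcomes}
    return cpd


cpd = observed_value_cpd([True, False])
```

Each row puts all of its mass on a single outcome, so an agent whose observation set contains this node learns V exactly when the Observes node is true.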


Under this construction, it is common knowledge that 
Iraq knows at stage 1 whether it has observed V 
at stage 1, and knows what value it has observed. 
However, since the value of $\mathit{Observes}_{i,1}(V)$ is not
common knowledge, the U.S. may not know whether
$\mathit{ObservedValue}_{i,1}(V)$ has the uninformative unknown
value, or is a copy of V. We have defined this construction
process with an example, but it is clearly
general enough to model uncertainty about whether
any variable X is in any observation set $O_{a,i}$. The
modified BN and observation sets now define a PEL
model over a richer set of states than simply the possible
instantiations of $\pi(\Phi)$.


4 Evaluating PEL Formulas in a BN 


This framework allows us to represent a PEL model 
compactly, but how do we answer queries such as the 
ones shown above? We can use an equivalent BN B to 
find the probability P(X = v) of any atomic formula, 
simply by finding Pr(X = v). We want to extend 
B so that it allows us to compute the probability of 
an arbitrary PEL formula $\varphi$. To this end, we first
define an indicator variable $\eta[\varphi]$ which is true if $(M, s) \models \varphi$
and false otherwise. We then extend the BN to
include not only the random variables $\pi(\Phi)$, but also
indicator variables for some set $\Delta$ of formulas that
we may assert or query. Since both $\pi(\Phi)$ and all such
indicator variables are defined on S, the distribution P
over S defines a joint distribution for $\pi(\Phi) \cup \eta[\Delta]$. Our
goal in constructing the augmented BN is to ensure
that it defines the same joint distribution.


Definition 5 Let $M = (S, \pi, K, P)$ be a PEL model;
let B be an augmented BN defining a joint distribution
Pr and let $O_{a,i}$ be observation sets. Let $\Delta$ be a set
of PEL formulas. Then M and $(B, \{O_{a,i}\})$ are $\Delta$-equivalent
if M and B are equivalent and:

• for every $\varphi \in \Delta$, $\eta[\varphi]$ is in B;

• for any instantiation z of $\pi(\Phi) \cup \eta[\Delta]$, P(z) = Pr(z).


We now present an algorithm that, given a BN that
is equivalent to a PEL model M, adds indicator variables
to create an augmented BN that is $\Delta$-equivalent
to M, for an arbitrary set of formulas $\Delta$. The central
function of our algorithm is CreateNode(B, $\varphi$),
which takes as arguments a BN B and a PEL formula
$\varphi$. Its purpose is to create an indicator node for $\varphi$,
store it in a global table, and give it the proper conditional
distribution given the other variables in B.


If there is already a node $\eta[\varphi]$ in the table,
CreateNode returns immediately. If $\varphi$ is an atomic
formula X = v, then the function creates a node $\eta[\varphi]$
whose sole parent is X. It defines the CPD of $\eta[\varphi]$
such that $\eta[\varphi] = \mathit{true}$ (with probability 1) if X = v,
and $\eta[\varphi] = \mathit{false}$ otherwise.


If $\varphi = \neg\psi$, the function calls CreateNode(B, $\psi$).
Then it creates a node $\eta[\varphi]$ with one parent, $\eta[\psi]$. It
defines the CPD of $\eta[\varphi]$ like a NOT gate: $\eta[\varphi] = \mathit{true}$
iff $\eta[\psi] = \mathit{false}$. If $\varphi = \psi \lor \chi$, the function calls
CreateNode(B, $\psi$) and CreateNode(B, $\chi$). Then it
creates a node $\eta[\varphi]$ with two parents, $\eta[\psi]$ and $\eta[\chi]$.
In this case, the CPD for $\eta[\varphi]$ is like an OR gate:
$\eta[\varphi] = \mathit{true}$ iff $\eta[\psi] = \mathit{true}$ or $\eta[\chi] = \mathit{true}$.


Figure 2: Bayes net with indicator variables added.


The interesting case is where $\varphi = \mathrm{BelCond}_{a,i}^{\geq r}(\psi \mid \chi)$.
As usual, the function begins by calling
CreateNode(B, $\psi$) and CreateNode(B, $\chi$). Now,
recall that $O_{a,i}$ is the set of variables whose values
agent a has observed at stage i. Clearly, whether
the agent assigns probability at least r to $\psi$ given $\chi$
depends on what the agent has observed. However,
it may be that not all the observations are relevant;
some of the variables in $O_{a,i}$ may be d-separated
from $\eta[\psi]$ given the other observations and $\eta[\chi]$.
Using an algorithm such as that of [14], CreateNode
determines the minimal subset $\mathit{Rel} \subseteq O_{a,i}$ of relevant
observations such that $O_{a,i} - \mathit{Rel}$ is d-separated
from $\eta[\psi]$ given $\mathit{Rel} \cup \{\eta[\chi]\}$. It then creates a node
$\eta[\varphi]$ with the elements of Rel as parents. Next,
CreateNode sets $\eta[\chi] = \mathit{true}$ as evidence, and uses
a BN inference algorithm (e.g., [10]) to obtain a joint
distribution over $\eta[\psi]$ and Rel. For each instantiation
rel of Rel, the function uses the joint distribution to
calculate $\Pr(\eta[\psi] \mid (\mathit{rel}; \eta[\chi] = \mathit{true}))$. We then set
the CPD $P(\eta[\varphi] \mid \mathit{rel})$ to give probability 1 to true
if $\Pr(\eta[\psi] = \mathit{true} \mid (\mathit{rel}; \eta[\chi] = \mathit{true})) \geq r$ and probability 1 to
false otherwise.
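
The thresholding step for a BelCond node can be illustrated with a small Python sketch. It is a simplification under stated assumptions: an explicitly enumerated joint distribution stands in for BN inference, the relevant set Rel is supplied by the caller rather than computed by a d-separation test, and all names and probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import product


def belcond_cpd(joint, rel, psi, chi, r, domains):
    """Deterministic CPD of the indicator for BelCond^{>= r}(psi | chi) given rel.

    joint:   list of (assignment dict, probability) pairs; a brute-force
             stand-in for BN inference
    rel:     list of relevant observed variable names (assumed precomputed)
    psi/chi: functions assignment -> bool, playing the role of the indicators
    Returns a dict mapping each instantiation of rel (a tuple) to the truth
    value that the new indicator node takes with probability 1.
    """
    cpd = {}
    for values in product(*(domains[x] for x in rel)):
        match = [(a, p) for a, p in joint
                 if all(a[x] == v for x, v in zip(rel, values))]
        chi_mass = sum(p for a, p in match if chi(a))
        both = sum(p for a, p in match if chi(a) and psi(a))
        # The formula is false when chi has probability zero given rel.
        cpd[values] = chi_mass > 0 and both / chi_mass >= r
    return cpd


# Hypothetical parameters: V and M independent, C depends on both.
domains = {"V": [True, False], "M": [True, False], "C": ["high", "low"]}
p_high = {(True, True): Fraction(1, 5), (False, True): Fraction(9, 10),
          (True, False): Fraction(0), (False, False): Fraction(0)}
joint = []
for v, m, c in product(domains["V"], domains["M"], domains["C"]):
    pv = Fraction(4, 5) if v else Fraction(1, 5)
    pm = Fraction(1, 2)
    pc = p_high[(v, m)] if c == "high" else 1 - p_high[(v, m)]
    joint.append(({"V": v, "M": m, "C": c}, pv * pm * pc))

cpd = belcond_cpd(joint, ["V", "M"], lambda a: a["C"] == "high",
                  lambda a: True, Fraction(4, 5), domains)
```

With these made-up numbers the indicator is true only for V = false and M = true. The loop visits every instantiation of Rel, which is why the cost is exponential in the number of relevant observations.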


As an example of how this algorithm works, consider 
the formula we discussed earlier involving Iraq’s beliefs 
about U.S. casualties: 


$\varphi = \mathrm{Bel}_{i,4}^{\geq 0.8}((C = \mathit{high}) \lor (C = \mathit{medium}))$


Calling CreateNode on this formula results in a recursive
call to create a node for $((C = h) \lor (C = m))$,
which in turn calls CreateNode for (C = h) and
(C = m). There are five random variables in Iraq’s
observation set at stage 4, but it turns out that only
two, V (vaccine effective) and M (missile developed),
are relevant to $((C = h) \lor (C = m))$. To obtain the
CPD for $\eta[\varphi]$, we perform BN inference to calculate
$\Pr(\eta[(C = h) \lor (C = m)] \mid \mathit{rel})$ for each of the four
instantiations rel of {V, M}. In our parameterization of
the model, it turns out that this probability is $\geq 0.8$
only when rel assigns false to V and true to M. So the
CPD for $\eta[\varphi]$ specifies true with probability 1 in this
case, and false with probability 1 in the other three
cases. The resulting BN is illustrated in Figure 2.


In proving the correctness of this algorithm, we will 
use the following lemma: 


Lemma 1 Let M be a PEL model, $a \in A$, $i \in \{1, \ldots, N_a\}$,
and $s \in S$. Let $o_{a,i,s}$ be the instantiation
of $O_{a,i}$ that s satisfies. Then for any formulas $\varphi$
and $\psi$:


$p_{a,i,s}([\varphi]_M \mid [\psi]_M) = P(\eta[\varphi] = \mathit{true} \mid (o_{a,i,s}; \eta[\psi] = \mathit{true}))$


This lemma puts the criterion for satisfaction of
$\mathrm{BelCond}_{a,i}^{\geq r}(\varphi \mid \psi)$ in a more convenient form. The
proof, which we do not give here, uses the definition
of $p_{a,i,s}$ and the Observation Set Assumption.


Proposition 1 (Correctness of CreateNode)
Suppose an augmented BN B is $\Delta$-equivalent to a PEL
model M. Then when CreateNode(B, $\varphi$) terminates,
B is $(\Delta \cup \{\varphi\})$-equivalent to M.


Proof: We use structural induction on $\varphi$; the inductive
hypothesis is that Proposition 1 holds for all subformulas
of $\varphi$. Thus, the recursive calls at the beginning
of CreateNode ensure that B is $\Gamma$-equivalent to
M, where $\Gamma$ is $\Delta$ plus all the subformulas of $\varphi$. Then
CreateNode(B, $\varphi$) adds $\eta[\varphi]$ to B. Let Pr be the distribution
defined by B before this addition, and Pr' be
the distribution afterwards.


By the definition of $\Gamma$-equivalence, we know that for
every instantiation x of $\pi(\Phi) \cup \eta[\Gamma]$, P(x) = Pr(x). We
must show that for every instantiation $(x; (\eta[\varphi] = t))$
of $\pi(\Phi) \cup \eta[\Delta] \cup \{\eta[\varphi]\}$:


$P((x; \eta[\varphi] = t)) = \Pr'((x; \eta[\varphi] = t))$   (1)


394 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000


Let pa be the instantiation x limited to the parents
of the newly created node $\eta[\varphi]$. Then by the
definition of conditional probability and the chain
rule for Bayes nets, equation (1) is equivalent to:
$P(\eta[\varphi] = t \mid x) \cdot P(x) = \Pr'(\eta[\varphi] = t \mid \mathit{pa}) \cdot \Pr'(x)$.
CreateNode only adds nodes as children of existing
nodes, so the marginal distribution over existing nodes
is not changed. Thus Pr'(x) = Pr(x). Along with the
fact that P(x) = Pr(x), this allows us to reduce equation
(1) to:


$P(\eta[\varphi] = t \mid x) = \Pr'(\eta[\varphi] = t \mid \mathit{pa})$   (2)


So what we must show is that the CPDs defined by
CreateNode are the correct CPDs for the indicator
variables. The cases where $\varphi$ is an atomic or Boolean
formula are straightforward, so we move directly to the
interesting case where $\varphi = \mathrm{BelCond}_{a,i}^{\geq r}(\psi \mid \chi)$.


In this case, CreateNode adds a node $\eta[\varphi]$ with the
relevant members of $O_{a,i}$ as parents (to simplify the
presentation, we will assume that all members of $O_{a,i}$
are relevant). Suppose x assigns values $o_{a,i}$ to $O_{a,i}$.
Then what we have to prove is:


$P(\eta[\varphi] = t \mid x) = \Pr'(\eta[\varphi] = t \mid o_{a,i})$   (3)


Consider any state s in which x holds. By Lemma 1
and the definition of satisfaction, $\eta[\varphi](s) = \mathit{true}$ if
and only if: $P(\eta[\psi] = \mathit{true} \mid o_{a,i,s}, \eta[\chi] = \mathit{true}) \geq r$.
But $\psi$ and $\chi$ are subformulas of $\varphi$, and $O_{a,i} \subseteq \Phi$, so
by the assumption that B is $\Gamma$-equivalent to M:


$P(\eta[\psi] = \mathit{true} \mid o_{a,i}, \eta[\chi] = \mathit{true}) = \Pr(\eta[\psi] = \mathit{true} \mid o_{a,i}, \eta[\chi] = \mathit{true})$


This last probability value is exactly what
CreateNode compares to r in constructing the
CPD for $\eta[\varphi]$. So the CPD is correct. ∎


Once we have a BN that is $\Delta$-equivalent to M we can
assert any formula $\varphi \in \Delta$ by setting $\eta[\varphi] = \mathit{true}$ as evidence.
To find the probability of any formula $\varphi \in \Delta$,
we simply query $\eta[\varphi] = \mathit{true}$. For example, the formula
$\varphi = \mathrm{Bel}_{i,4}^{\geq 0.8}((C = h) \lor (C = m))$ that we discussed
earlier has probability 0.16 in our model. However,
if we assert V = false (i.e., the vaccine is ineffective),
then $\Pr(\varphi)$ goes up to 0.8.


The number of BN queries required to make a BN $\Delta$-equivalent
to M is linear in the number of BelCond formulas,
since CreateNode is only called once for each
subformula. The CreateNode function takes time
exponential in the maximal number of relevant observations
for the BelCond subformulas, as we need to
compute the probability $\Pr(\eta[\psi] \mid (\mathit{rel}; \eta[\chi] = \mathit{true}))$
for every instantiation rel of Rel. Most naively, we
simply run BN inference for each rel separately. In certain
cases, we can get improved performance by running
a single query $\Pr(\eta[\psi], \mathit{Rel} \mid \eta[\chi] = \mathit{true})$ and
then renormalizing appropriately; this approach can
allow us to exploit the dynamic programming of BN
inference algorithms. We note that the newly added
nodes also add complexity to the BN, and can make
the inference cost grow in later parts of the computation
(e.g., by increasing the size of cliques).


Figure 3: Influence diagram representing Iraq’s decision
scenario. No-forgetting arcs are not shown.
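
The renormalization step mentioned above can be sketched as follows: given one joint table over the indicator of $\psi$ and Rel (already conditioned on $\chi$ being true), dividing by the marginal of each rel yields all the conditionals at once. The table layout and the numbers are illustrative assumptions, not output from any particular inference engine.

```python
from collections import defaultdict
from fractions import Fraction


def conditionals_from_joint(table):
    """Turn Pr(psi, rel | chi = true) into Pr(psi = true | rel, chi = true).

    table: dict mapping (psi_value, rel_instantiation) -> probability.
    Instantiations of rel with zero probability are omitted from the result.
    """
    rel_mass = defaultdict(Fraction)
    for (_, rel), p in table.items():
        rel_mass[rel] += p                   # marginal Pr(rel | chi = true)
    return {rel: table.get((True, rel), Fraction(0)) / mass
            for rel, mass in rel_mass.items() if mass > 0}


# Hypothetical joint table over two instantiations of Rel.
table = {
    (True, "r1"): Fraction(1, 10), (False, "r1"): Fraction(3, 10),
    (True, "r2"): Fraction(3, 5),  (False, "r2"): Fraction(0),
}
cond = conditionals_from_joint(table)
```

Each resulting value would then be compared against the threshold r to fill in the deterministic CPD of the BelCond indicator.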


5 Reasoning about Decisions 


So far, we have assumed that we have a probability 
distribution over all variables in the system. In prac- 
tice, however, we have agents who make decisions in 
accordance with their beliefs and preferences. In our 
example, P, B and A are actually decisions made by 
Iraq. Our construction took these to be random vari- 
ables, each with a CPD representing a distribution 
over Iraq’s decision. If these CPDs are reasonable, 
then our system will give reasonable answers; e.g., we 
will obtain a lower probability that the anthrax vac- 
cine is effective if we observe that Iraq has purchased 
anthrax, since it would not be rational for Iraq to pur- 
chase a bacterium for which the U.S. has an effective 
vaccine. We would like to extend our framework to in- 
duce automatically the actions that agents will take at 
various decision points. As discussed in the introduc- 
tion, this problem is quite complex when there are mul- 
tiple decision makers with conflicting goals. We there- 
fore focus on the case of a single decision maker. We 
note, however, that we can still have multiple agents 
reasoning about the decision maker and about each 
other’s state of knowledge. 


Assuming that agents act rationally, we can automate 
the construction of CPDs for decision nodes by mod- 
eling the decision maker’s decision process with an in- 
fluence diagram, and solving the influence diagram to 
obtain CPDs for the decision nodes. Somewhat sur- 
prisingly, the possibility of modeling other agents with 
influence diagrams has not been explored deeply in the 
existing literature, although Nilsson and Jensen mention
it in passing [11]. Suryadi and Gmytrasiewicz [15]
take an approach similar to ours in that they use an ID 
to model another agent’s decision process. However, 
they discuss learning the structure and parameters of 
the ID from observations collected over a large set of 
similar decision situations. We assume that the ID is 
given, and concentrate on the inferences that can be 
made only from a few observations about the current 
situation. 


Figure 3 depicts an influence diagram for the scenario 
described in the introduction. An influence diagram 
is a directed acyclic graph with three kinds of nodes. 
Chance nodes, like nodes in a BN, correspond to ran- 
dom variables; they are represented by ovals. Decision 
nodes, drawn as rectangles, correspond to variables 
that the decision-maker can control. Utility nodes, 
drawn as diamonds, correspond to components of the 
decision-maker’s utility function. 


The decision nodes of an ID are ordered $D_1, \ldots, D_n$
according to the order in which the decisions are made.
The parents of $D_i$, denoted $\mathrm{Pa}(D_i)$, are those variables
whose value the decision-maker knows when decision
$D_i$ is made. Thus, when we are creating a PEL model
and an ID for the same scenario, the decision-maker’s
observation stages correspond to his decision nodes,
with $O_{a,i}$ equal to $\mathrm{Pa}(D_i)$. A utility node $U_i$ represents
a deterministic function $f_i$ from instantiations of
$\mathrm{Pa}(U_i)$ to real numbers. The utility of an outcome is
the sum of the individual utility functions $f_i$.


Solving an influence diagram means deriving an optimal
policy, consisting of a decision rule for each decision
node. A decision rule $\delta_i$ for a node $D_i$ is a function
from $\mathrm{dom}(\mathrm{Pa}(D_i))$ to $\mathrm{dom}(D_i)$. For each instantiation
of $\mathrm{Pa}(D_i)$, the decision rule gives the action that maximizes
the decision-maker’s expected utility, assuming
it will act rationally in all future decisions. The standard
algorithms for solving IDs utilize backwards induction:
the decision rules for the decision nodes are
calculated in the reverse of their temporal order [9].
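
For intuition, the sketch below computes a decision rule for the simplest case of a single decision node, which is the base step of backward induction (there are no later decisions to account for). It is not a general ID solver. The outcome model, the utilities, and all names are hypothetical.

```python
def decision_rule(parent_values, actions, outcome_dist, utility):
    """Map each instantiation pa of Pa(D) to an action maximizing expected utility.

    outcome_dist(pa, d) -> dict mapping outcome -> probability
    utility(outcome, d) -> numeric utility of that outcome under action d
    """
    rule = {}
    for pa in parent_values:
        def expected_utility(d, pa=pa):
            return sum(p * utility(o, d) for o, p in outcome_dist(pa, d).items())
        rule[pa] = max(actions, key=expected_utility)
    return rule


# Hypothetical single decision: purchase (True/False) after observing V.
def outcome_dist(v, purchase):
    """Chance of high casualties; made-up numbers for illustration only."""
    p_high = 0.0 if not purchase else (0.1 if v else 0.9)
    return {"high": p_high, "low": 1.0 - p_high}


def utility(outcome, purchase):
    """Reward 10 for high casualties, cost 3 for purchasing (assumed values)."""
    return (10 if outcome == "high" else 0) - (3 if purchase else 0)


rule = decision_rule([True, False], [False, True], outcome_dist, utility)
```

Under these assumed utilities the rule is to purchase only when the vaccine has been observed to be ineffective.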


After we have the decision rules, we can easily trans- 
form an influence diagram D into a Bayesian network 
B(D). We remove the utility nodes, and change the de- 
cision nodes into chance nodes (ordinary BN nodes). 
If $D_i$ is a decision node, then for each instantiation pa
of $\mathrm{Pa}(D_i)$, we create a probability distribution that
gives probability 1 to $\delta_i(\mathit{pa})$, and probability 0 to all
other elements of $\mathrm{dom}(D_i)$. This distribution becomes
the CPD for $D_i$ given pa.
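
Converting a decision rule into a CPD is mechanical, as the following short sketch shows. The dictionary representation of the rule and of the CPD is an assumption made for illustration.

```python
def rule_to_cpd(rule, actions):
    """Deterministic CPD for a former decision node.

    rule:    dict mapping each parent instantiation pa to the chosen action
    actions: the domain of the decision node
    Returns a dict pa -> {action: probability}, with probability 1 on rule[pa].
    """
    return {pa: {d: (1.0 if d == chosen else 0.0) for d in actions}
            for pa, chosen in rule.items()}


# Hypothetical rule: purchase (True) only when the parent V is False.
cpd = rule_to_cpd({True: False, False: True}, [False, True])
```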


We can use this system to make inferences about un- 
observed world variables based on evidence of agents’ 
actions. Suppose the parameters of the model D depicted
in Figure 3 are such that the prior probability
of the vaccine being effective is 0.8, but it is irrational
for Iraq to purchase anthrax unless it has observed the
vaccine to be ineffective. As above, we may need to
add some additional nodes to D, such as Observes and
ObservedValue nodes to model the U.S.’s uncertainty
about whether Iraq observes V at stage 1. We then use
the method described in this section to derive CPDs
for Iraq’s decision nodes, creating a BN B(D). The influence
diagram defines the observation sets for Iraq;
we will use the U.S. observation sets described in Section 3.
We can then use the algorithm of Section 4 to
find the probabilities of arbitrary PEL formulas in the
PEL model corresponding to B(D).


At stage 1, the U.S. assigns probability 0.8 to
the vaccine being effective: all states satisfy
$\mathrm{Bel}_{u,1}^{\geq 0.8}(V = \mathit{true})$. At stage 2, however, the situation
changes. It turns out that $\mathrm{Bel}_{u,2}^{\geq 0.8}(V = \mathit{true})$ is
true if and only if there is not a fire at the Iraqi biological
weapons plant. A fire provides the U.S. with
strong evidence that Iraq has begun developing an
anthrax-carrying missile, which would not be rational
unless Iraq had purchased anthrax, which implies
that Iraq has observed the anthrax vaccine to be ineffective.
So in this model, $\Pr(\mathrm{Bel}_{u,2}^{\geq 0.8}(V = \mathit{true})) =
\Pr(F = \mathit{false})$. In a more complex query, we could
compute $\Pr(\mathrm{Bel}_{u,2}^{\geq 0.8}(V = \mathit{true}) \mid V = \mathit{false})$, i.e., the
probability that the U.S. will believe the vaccine to be 
effective despite the fact that it is not. The answer 
to this query would depend on the prior probability 
about the vaccine’s effectiveness, Iraq’s decisions, and 
the chances of observing a fire. 
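The reasoning in this example can be checked by brute-force enumeration on a much smaller toy network. The sketch below is not the model of Figure 3: it assumes that Iraq always observes V, that Iraq purchases anthrax exactly when the vaccine is ineffective, and that the fire probabilities are 0.3 after a purchase and 0.01 otherwise. Those numbers and all function names are hypothetical; only the prior of 0.8 comes from the text.

```python
P_V_TRUE = 0.8                                    # prior from the text
P_FIRE_GIVEN_PURCHASE = {True: 0.3, False: 0.01}  # hypothetical values


def joint():
    """Enumerate (v, purchase, fire, probability) for the toy network."""
    for v in (True, False):
        p_v = P_V_TRUE if v else 1.0 - P_V_TRUE
        purchase = not v  # assumed deterministic decision rule for Iraq
        p_fire = P_FIRE_GIVEN_PURCHASE[purchase]
        for fire in (True, False):
            yield v, purchase, fire, p_v * (p_fire if fire else 1.0 - p_fire)


def us_belief_v_true(fire_obs):
    """U.S. posterior Pr(V = true | F = fire_obs); the U.S. observes only F."""
    num = sum(p for v, _, f, p in joint() if f == fire_obs and v)
    den = sum(p for _, _, f, p in joint() if f == fire_obs)
    return num / den


def prob_belief_formula(threshold, given_v=None):
    """Pr(Bel_{>=threshold}(V = true)), optionally conditioned on V = given_v."""
    num = den = 0.0
    for v, _, f, p in joint():
        if given_v is not None and v != given_v:
            continue
        den += p
        # The belief formula holds in a state iff the posterior the U.S.
        # forms from its observation in that state meets the threshold.
        if us_belief_v_true(f) >= threshold:
            num += p
    return num / den


p_no_fire = sum(p for _, _, f, p in joint() if not f)
p_bel = prob_belief_formula(0.8)
p_bel_given_v_false = prob_belief_formula(0.8, given_v=False)
```

Under these assumed parameters the belief formula holds exactly in the states without a fire, so `p_bel` coincides with `p_no_fire`, while `p_bel_given_v_false` equals the assumed chance that no fire occurs after a purchase.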


The CPDs for decision nodes derived by solving an 
influence diagram become part of the common prior 
distribution in the resulting BN. However, these CPDs 
are derived using the decision-maker’s utility function. 
Thus, in assuming that the decision-maker’s decision 
rules are part of the common prior, we are implicitly 
assuming that the decision-maker’s utility function is 
common knowledge. Like the assumption that obser- 
vations are common knowledge, this is an assumption 
we would like to relax. 


Just as we introduced Observes nodes to model un- 
certainty about an agent’s observations, we can in- 
troduce preference nodes to model uncertainty about 
an agent’s utility function. These preference nodes 
are parents of particular utility nodes, and modify the 
way the utility depends on other variables. They are 
also in all the decision-maker’s observation sets, as- 
suming he knows his own preferences. One might pro- 
pose to use continuous-valued preference nodes that 
define a distribution over the decision-maker’s util- 
ity value. The problem with this approach is that 
these continuous-valued preference nodes must be par- 
ents of every decision node, and standard ID solution 
algorithms cannot handle continuous values in such 
a context. We therefore use discrete-valued prefer- 
ence nodes, with the resulting coarse-grained prefer- 
ence models. For example, we can introduce a node 
A representing Iraq’s aversion to doing business with 
criminal biological weapons dealers, which is a parent 
of the cost node associated with D_1. If A = high, 
then the cost is greater in magnitude than it would be 
if A = low. The preference node A is in the parent 
sets of all Iraq’s decision nodes, but the U.S. will not 
be able to observe it directly. 
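The effect of a discrete preference node on a derived decision rule can be illustrated as follows. This is a rough sketch under assumed numbers: the benefit of 10, the costs of 4 and 12, and all identifiers are hypothetical, and the single-decision setup is far simpler than the influence diagram discussed above.

```python
BENEFIT_IF_INEFFECTIVE = 10.0                     # hypothetical payoff
COST_BY_AVERSION = {"low": 4.0, "high": 12.0}     # hypothetical cost node values
# Iraq's belief that the vaccine is ineffective, given what it observed.
P_INEFFECTIVE_GIVEN_OBS = {"effective": 0.0, "ineffective": 1.0}


def expected_utility(action, observation, aversion):
    """Expected utility of an action given the observation and the value of A."""
    if action == "abstain":
        return 0.0
    benefit = P_INEFFECTIVE_GIVEN_OBS[observation] * BENEFIT_IF_INEFFECTIVE
    return benefit - COST_BY_AVERSION[aversion]


def solve_decision(observation, aversion):
    """Pick the action with maximum expected utility for this parent instantiation."""
    actions = ("abstain", "purchase")
    return max(actions, key=lambda a: expected_utility(a, observation, aversion))


# Because A is a parent of the decision node, the rule is indexed by
# (observation, A) rather than by the observation alone.
rule = {
    (obs, a): solve_decision(obs, a)
    for obs in P_INEFFECTIVE_GIVEN_OBS
    for a in COST_BY_AVERSION
}
```

With these numbers the purchase is rational only when the vaccine was observed to be ineffective and A = low, so an observed purchase would tell the U.S. something about both V and A.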


6 Discussion and Future Work 


This paper combines epistemic logic with Bayesian 
networks to create an integrated system for probabilis- 
tic reasoning about agents’ beliefs and decisions. Al- 
though PEL is essentially a restricted version of the 
logic presented by Fagin and Halpern, we believe it is 
flexible enough to be useful in many practical applica- 
tions. Furthermore, the simplicity of PEL allows us to 
define an algorithm for finding the probability of a for- 
mula in a PEL model using a Bayesian network, rather 
than constructing the PEL model explicitly. We also 
show how to construct this Bayesian network from an 
influence diagram, rather than having a human fill in 
the CPDs for nodes that represent an agent’s decisions. 


Our approach is limited by the common prior assump- 
tion, which implies that all differences between agents’ 
beliefs are due to their having different observations. 
This assumption is common in economics, and has 
important ramifications [1]. It allows agents’ beliefs 
to be arbitrarily different, as long as they have re- 
ceived sufficiently different observations. But it may 
be impractical to represent in a BN all the different 
observations that have caused agents’ beliefs to di- 
verge. An alternative is to explicitly represent un- 
certainty about each agent’s probability distribution. 
However, this approach introduces substantial com- 
plications: Do we also model one agent’s distribution 
about another agent’s distribution? If so, do we model 
the infinite belief hierarchy? Therefore, the extension 
to this case is far from obvious. Another assumption 
that we would like to relax is that agents are perfect 
probabilistic reasoners and decision makers. 


The other obvious limitation of the system described 
in this paper is that although it can reason about 
the beliefs of an arbitrary number of agents, it can 
only reason explicitly about one agent’s decisions. If 
we wish to have the system automatically derive the 
CPDs for decisions made by multiple agents, the max- 
imum expected utility solution concept is no longer 
appropriate, since the agents do not have probability 
distributions over each other’s actions. We could uti- 
lize game-theoretic solution concepts [5] to find ratio- 
nal strategies for the agents, and then substitute these 
strategies for the agents’ CPDs as we did in Section 5; 
the rest of our results would still be applicable. How- 
ever, the framework of multi-agent rationality is sub- 
stantially more ambiguous than the single-agent case, 
so this approach does not define a unique answer. 
We hope to investigate this issue in future work. 